Active Cleaning for Video Corpus Annotation

Authors

Bahjat Safadi

Stéphane Ayache

Georges Quénot

Abstract

In this paper, we describe the active cleaning approach that was used to complement active learning in the TRECVID collaborative annotation. It consists of using a classification system to select the most informative samples for multiple annotation, in order to improve the quality and reliability of the annotations. We evaluated the actual impact of the active cleaning approach on the TRECVID 2007 collection. The evaluations were conducted using complete annotations collected from different sources, including the TRECVID collaborative annotations and the MCG-ICT-CAS annotations. In our experiments, a significant improvement in annotation quality was observed when applying the cleaning-by-cross-validation strategy, which selects the samples to be re-annotated. Experiments show that higher performance can be reached with a double annotation of 10% of the negative samples, or 5% of all the annotated samples, selected by the proposed cross-validation cleaning strategy. With an appropriate strategy, using a small fraction of the annotations for cleaning improves the system's performance much more than using the same fraction for adding more annotations.
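The selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' system: the nearest-centroid scorer stands in for their classification system, the toy two-dimensional features are invented for illustration, and the names `cross_val_scores` and `select_for_reannotation` are hypothetical. The idea shown is only the general one: score each sample with a model trained on the other folds, then send the negative-labelled samples that look most positive back for a second annotation.

```python
import random


def centroid(rows):
    """Component-wise mean of a non-empty list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cross_val_scores(features, labels, k=5, seed=0):
    """Out-of-fold score per sample; higher means 'looks more positive'.

    Assumption: a nearest-centroid scorer replaces the real classifier.
    """
    idx = list(range(len(features)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    scores = [0.0] * len(features)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        pos = [features[i] for i in train if labels[i] == 1]
        neg = [features[i] for i in train if labels[i] == 0]
        if not pos or not neg:
            continue  # cannot score this fold; its scores stay at 0.0
        c_pos, c_neg = centroid(pos), centroid(neg)
        for i in fold:
            scores[i] = sq_dist(features[i], c_neg) - sq_dist(features[i], c_pos)
    return scores


def select_for_reannotation(features, labels, fraction=0.10, k=5, seed=0):
    """Indices of the negative-labelled samples most likely to be mislabelled."""
    scores = cross_val_scores(features, labels, k, seed)
    negatives = [i for i, y in enumerate(labels) if y == 0]
    negatives.sort(key=lambda i: scores[i], reverse=True)
    budget = max(1, round(fraction * len(negatives)))
    return negatives[:budget]


# Toy data (invented): negatives near (0, 0), positives near (1, 1),
# plus one positive-looking sample wrongly labelled as negative.
neg_pts = [[0.1 * a, 0.1 * b] for a in range(3) for b in range(3)]
pos_pts = [[1.0 - 0.1 * a, 1.0 - 0.1 * b] for a in range(3) for b in range(3)]
features = neg_pts + pos_pts + [[0.95, 0.95]]
labels = [0] * 9 + [1] * 9 + [0]

to_recheck = select_for_reannotation(features, labels, fraction=0.10)
print(to_recheck)  # index of the wrongly labelled sample: [18]
```

With a 10% budget over the ten negative-labelled samples, exactly one sample is selected, and it is the deliberately mislabelled one. In a real setting the selected samples would be shown to a second annotator rather than relabelled automatically.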


Similar resources

Annotation of Clinical Narratives in Bulgarian language

In this paper we describe the annotation process of clinical texts with morphosyntactic and semantic information. The corpus contains 1,300 discharge letters in the Bulgarian language for patients with endocrinology and metabolic disorders. The annotated corpus will be used as a gold standard for information extraction evaluation of a test corpus of 6,200 discharge letters. The annotation is performed wi...


Indexation sémantique des images et des vidéos par apprentissage actif. (Semantic indexing of images and videos by active learning)

The general framework of this thesis is semantic indexing and information retrieval, applied to multimedia documents. More specifically, we are interested in the semantic indexing of concepts in images and videos by the active learning approaches that we use to build annotated corpora. Throughout this thesis, we have shown that the main difficulties of this task are often related, in general, t...


Introducing the Reference Corpus of Contemporary Portuguese Online

We present our work in processing the Reference Corpus of Contemporary Portuguese and its publication online. After discussing how the corpus was built and our choice of meta-data, we turn to the processes and tools involved for the cleaning, preparation and annotation to make the corpus suitable for linguistic inquiries. The Web platform is described, and we show examples of linguistic resourc...


7x1-PT: um Corpus extraído do Twitter para Análise de Sentimentos em Língua Portuguesa (7x1-PT: a Corpus extracted from Twitter for Sentiment Analysis in Portuguese Language)

This paper describes the 7x1PT corpus that contains a set of tweets, in Portuguese, posted during the match Germany vs Brazil at the FIFA World Cup 2014. We describe data collection, cleaning and organization, and also the current stage of the linguistic annotation of this corpus.


A Large Portuguese Corpus On-Line: Cleaning and Preprocessing

We present a newly available on-line resource for Portuguese, a corpus of 310 million words, a new version of the Reference Corpus of Contemporary Portuguese, now searchable via a user-friendly web interface. Here we report on work carried out on the corpus prior to its publication on-line. We focus on the processes and tools involved for the cleaning, preparation and annotation to make the ...




Publication date: 2012
